Visualizing statistics associated with hierarchical information can be difficult. In the case of organism abundance data classified by a taxonomy, stacked bar charts or pie charts are typically used to view a single rank in the taxonomy (e.g. phylum). Because these visualizations use color to distinguish taxa, the number of taxa that can be displayed is limited to the number of colors that can be easily discerned. Metacoder provides an alternative visualization, which we call heat trees, for plotting this type of data.
Although there are many options that can be used to make highly customized graphs, heat_tree only needs one argument to function: an object of type taxmap. We can see the default appearance of a subset of the UNITE database (Kõljalg et al. 2005) using the code below:
library(metacoder)
unite_ex_data_3
## `taxmap` object with data for 703 taxa and 500 observations:
##
## ------------------------------- taxa -------------------------------
## 1, 2, 3, 4, 5, 6, 7 ... 697, 698, 699, 700, 701, 702, 703
##
## ---------------------------- taxon_data ----------------------------
## # A tibble: 703 × 4
##   taxon_ids supertaxon_ids unite_rank            name
##       <chr>          <chr>      <chr>           <chr>
## 1         1           <NA>          k           Fungi
## 2         2              1          p      Ascomycota
## 3         3              1          p   Basidiomycota
## 4         4              1          p Chytridiomycota
## 5         5              1          p   Glomeromycota
## 6         6              1          p    unidentified
## 7         7              1          p      Zygomycota
## # ... with 696 more rows
##
## ----------------------------- obs_data -----------------------------
## # A tibble: 500 × 5
##   obs_taxon_ids                 seq_name   seq_id      other_id
##           <chr>                    <chr>    <chr>         <chr>
## 1           183               Lachnum_sp JQ347180 SH189775.06FU
## 2           175 Lachnellula_calyciformis   U59145 SH189776.06FU
## 3           183               Lachnum_sp AM084756 SH189777.06FU
## 4           183               Lachnum_sp FM172814 SH189778.06FU
## 5           183               Lachnum_sp FN539058 SH189779.06FU
## 6           181    Lachnum_pulverulentum AB481260 SH189780.06FU
## 7           183               Lachnum_sp HQ211694 SH189781.06FU
## # ... with 493 more rows, and 1 more variables: sequence <chr>
##
## --------------------------- taxon_funcs ---------------------------
## n_obs, n_obs_1, n_supertaxa, n_subtaxa, n_subtaxa_1, hierarchies
heat_tree(unite_ex_data_3)
Each node (i.e. circle) in the graph represents a taxon, and each line represents its membership in a taxon of a coarser taxonomic rank.
The size of nodes and edges can be scaled to any number associated with each taxon using the node_size and edge_size parameters. Below, the number of sequences for each taxon is used to determine node size.
heat_tree(unite_ex_data_3,
node_size = n_obs)
Note that it was not necessary to specify the absolute node size; the range of absolute node sizes is optimized for each graph so as to minimize overlap of nodes and maximize the range of sizes. The overlap_avoidance argument determines how strongly overlaps are avoided. Higher values give more importance to avoiding overlapping nodes than to maximizing the range of sizes. A high overlap_avoidance makes the connections between taxa clearer but diminishes the visual effect of node size. An overlap_avoidance that is too low can make the graph hard to read by allowing nodes to overlap more.
heat_tree(unite_ex_data_3,
node_size = n_obs,
overlap_avoidance = 10)
heat_tree(unite_ex_data_3,
node_size = n_obs,
overlap_avoidance = 0.1)
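The idea of mapping a per-taxon statistic onto a range of node sizes can be sketched in base R. This is only an illustration under simple assumptions: the helper `rescale_to_range`, its default size range, and the example counts are hypothetical, and the linear rescaling shown here is not metacoder's actual size optimization.

```r
# Hypothetical helper, not part of metacoder: linearly rescale a numeric
# statistic (e.g. sequence counts per taxon) to a chosen range of sizes.
rescale_to_range <- function(x, to = c(0.01, 0.05)) {
  from <- range(x, na.rm = TRUE)
  if (from[1] == from[2]) {
    # All values are identical: use the midpoint of the target range
    return(rep(mean(to), length(x)))
  }
  (x - from[1]) / (from[2] - from[1]) * (to[2] - to[1]) + to[1]
}

n_obs_example <- c(500, 120, 30, 5, 1)  # made-up per-taxon counts
sizes_example <- rescale_to_range(n_obs_example)
print(sizes_example)
```

A wider target range makes differences between taxa easier to see but leaves less room between large nodes, which is the trade-off that overlap_avoidance controls.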
The node_color argument works in a similar way to node_size. Numeric values are translated to a range of colors. Below, the number of sequences for each taxon is used to determine color as well as size. The range of colors used can be set using the node_color_range argument. This argument takes a vector of colors in the form of names, hex color codes, or integers.
heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = n_obs)
heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = n_obs,
node_color_range = c("#FFFFFF", "darkorange3", "#4e567d", "gold"))
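To make the translation from numbers to colors more concrete, here is a minimal base R sketch that interpolates a gradient through the same four colors and looks up a color for each value. The helper `values_to_colors`, the `n_steps` parameter, and the example counts are hypothetical; this is one simple way to do such a mapping, not necessarily how heat_tree does it internally.

```r
# Hypothetical helper, not metacoder's implementation: map numbers onto a
# gradient that passes through the given colors.
values_to_colors <- function(x, color_range, n_steps = 100) {
  palette <- grDevices::colorRampPalette(color_range)(n_steps)
  span <- range(x, na.rm = TRUE)
  if (span[1] == span[2]) {
    # No variation in the data: every value gets the first color
    return(rep(palette[1], length(x)))
  }
  # Position of each value along the gradient, as an index from 1 to n_steps
  index <- round((x - span[1]) / (span[2] - span[1]) * (n_steps - 1)) + 1
  palette[index]
}

n_obs_example <- c(1, 5, 30, 120, 500)  # made-up per-taxon counts
colors_example <- values_to_colors(
  n_obs_example,
  c("#FFFFFF", "darkorange3", "#4e567d", "gold")
)
print(colors_example)
```

The smallest value receives the first color in the range and the largest value receives the last, with intermediate values falling along the interpolated gradient.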
As with size, the color of lines can be set independently of the color of nodes, although by default the lines have the same color as the nodes. To color only the nodes, set the lines to a constant color, or vice versa.
heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = n_obs,
edge_color = "grey")
heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = "grey",
edge_color = n_obs)
You can also set the color palette used for the lines in the same way as for the nodes, using the edge_color_range argument.
Labels can be added to nodes using the node_label option:
heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = n_obs,
node_label = name)
Label sizes are proportional to node size by default. To avoid excessive crowding, only a limited number of labels is printed. This maximum is controlled by the node_label_max option:
heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = n_obs,
node_label = name,
node_label_max = 5)
heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = n_obs,
node_label = name,
node_label_max = 200)
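The effect of capping the number of labels can be sketched in base R. The sketch assumes that labels are kept for the largest nodes first, which is an assumption made for illustration rather than something stated above; the helper `limit_labels` and the example data are hypothetical.

```r
# Hypothetical helper, not part of metacoder: keep labels only for the
# `max_labels` largest nodes and blank out the rest.
# Assumption: larger nodes get priority when labels are dropped.
limit_labels <- function(labels, sizes, max_labels) {
  # Rank 1 is the largest node; ties are broken by position
  keep <- rank(-sizes, ties.method = "first") <= max_labels
  labels[!keep] <- NA_character_
  labels
}

labels_example <- limit_labels(
  labels = c("Fungi", "Ascomycota", "Basidiomycota", "Zygomycota"),
  sizes = c(500, 300, 150, 50),  # made-up node sizes
  max_labels = 2
)
print(labels_example)
```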
Note that the labels are a special kind that scales with the size of the graph. This means that the text size will always be proportional to the graph size regardless of how big the graph is rendered. However, these special labels take more time to render, so printing too many can drastically slow the rendering of the graph or even cause errors.
Lines can be labeled as well using the edge_label option, which works similarly to the node_label option:
heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = n_obs,
edge_label = name)
The default background color is transparent in order to make formatting posters and slideshows as flexible as possible. Other background colors can be specified using the background_color option:
heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = n_obs,
background_color = "grey")
Plots can be saved using ggsave from the ggplot2 package or using the output_file option:
my_plot <- heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = n_obs)
ggplot2::ggsave("path/to/my/output.png", my_plot, bg = "transparent")
heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = n_obs,
output_file = "path/to/my/output.png")
Sometimes a taxonomy has multiple roots. This occurs when there is no single taxon to which all observations are assigned, such as “Eukaryota” when all observations are associated with eukaryotes. metacoder plots taxonomies with multiple roots as multiple trees:
heat_tree(contaminants,
node_size = n_obs,
node_color = n_obs,
node_label = name,
tree_label = name,
layout = "fruchterman-reingold")
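Whether a taxonomy has multiple roots can be checked from a taxon table shaped like the taxon_data printed earlier, where the root taxon (Fungi) has `<NA>` as its supertaxon. The sketch below uses a made-up data frame with the same column names; it assumes that a root is any taxon with a missing supertaxon_ids value.

```r
# Made-up taxon table with the same columns as taxon_data above.
# Assumption: a root taxon is one whose supertaxon ID is NA.
taxon_data_example <- data.frame(
  taxon_ids = c("1", "2", "3", "4", "5"),
  supertaxon_ids = c(NA, "1", NA, "3", "3"),
  name = c("Fungi", "Ascomycota", "Bacteria", "Firmicutes", "Proteobacteria"),
  stringsAsFactors = FALSE
)

# Taxa without a supertaxon are the roots; each one would start its own tree
roots <- taxon_data_example[is.na(taxon_data_example$supertaxon_ids), ]
print(roots$name)
has_multiple_roots <- nrow(roots) > 1
print(has_multiple_roots)
```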
To see the long list of available plotting options, type ?heat_tree. Although there are many options (perhaps an overwhelming number at first glance), most fall into groups that work in similar ways.
sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.2 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] metacoder_0.1.2 knitcitations_1.0.7 knitr_1.14
##
## loaded via a namespace (and not attached):
## [1] igraph_1.0.1 Rcpp_0.12.9 magrittr_1.5 munsell_0.4.3
## [5] colorspace_1.2-7 R6_2.2.0 bibtex_0.4.0 stringr_1.1.0
## [9] httr_1.2.1 plyr_1.8.4 dplyr_0.5.0 tools_3.3.1
## [13] gtable_0.2.0 DBI_0.5-1 htmltools_0.3.5 lazyeval_0.2.0
## [17] assertthat_0.1 yaml_2.1.13 rprojroot_1.2 digest_0.6.12
## [21] tibble_1.2 RJSONIO_1.3-0 ggplot2_2.2.1 reshape2_1.4.2
## [25] RefManageR_0.13.1 formatR_1.4 bitops_1.0-6 RCurl_1.95-4.8
## [29] evaluate_0.10 rmarkdown_1.3 labeling_0.3 stringi_1.1.2
## [33] scales_0.4.1 backports_1.0.5 XML_3.98-1.4 lubridate_1.6.0
Kõljalg, Urmas, Karl-Henrik Larsson, Kessy Abarenkov, R Henrik Nilsson, Ian J Alexander, Ursula Eberhardt, Susanne Erland, et al. 2005. “UNITE: A Database Providing Web-Based Methods for the Molecular Identification of Ectomycorrhizal Fungi.” New Phytologist 166 (3). Wiley Online Library: 1063–8.